Project proposal Automatic extraction and evaluation of MWE: adapting method to French Language Technology: Research and Development
Abstract
Our project addresses Multi-Word Expressions (MWE), focusing on the problem of extraction. This task is important for improving the lexical resources used in tasks such as tokenization, parsing, and translation. In our study we will work on a French corpus. Our aim is not only to select candidates but also to validate automatically which of them are true MWE. If time permits, we will apply our technique to a domain-specific corpus. In this second part we expect to extract not only MWE from general language but also MWE specific to a domain.
Similar resources
English-Russian-Finnish Cross-Language Comparison of Phrasal Verb Translation Equivalents
A phraseological expression in a language may have equivalent expressions in other languages with different morpho-syntactic structures and semantic properties. Our recent experience in the Benedict Project (EU IST-2001-34237), in which a Finnish semantic lexicon compatible with the Lancaster English semantic lexicon (Rayson et al., 2004) has been built, shows that there can exist complex cross-l...
Automatic extraction and evaluation of MWE
This short paper aims at presenting a method for automatically extracting and evaluating MWE in the Europarl corpus. For this purpose we make use of mwetoolkit and utilize its output to find rules for the automatic evaluation of MWE. We then developed an XML parser to evaluate MWE candidates against those rules and also against online dictionaries. A sample of the results was manually evaluated...
Automatic Road Detection and Extraction From MultiSpectral Images Using a New Hierarchical Object-based Method
Road detection and extraction is one of the most important issues in photogrammetry, remote sensing and machine vision. A great deal of research has been done in this area based on multispectral images, mostly with relatively good results. In this paper, a novel automated and hierarchical object-based method for detecting and extracting roads is proposed. This research is based on the M...
EVALDA-CESART Project: Terminological Resources Acquisition Tools Evaluation Campaign
This paper describes the ongoing evaluation work in CESART research project supported by the French Ministry of Research and Technology and coordinated by the University of Lille 3 and ELDA. The project deals with the evaluation of term and semantic relation extraction from corpora in French. CESART logically follows on the evaluation project achieved within the framework of the Concerted Resea...
Automatic Knowledge Acquisition and Integration Technique: Application to Large Scale Taxonomy Extraction and Document Annotation
We present new results of our research on integration of ontologies created automatically by means of Human Language Technologies. The research is related to OLE (Ontology LEarning) – a project aimed at bottom-up generation and merging of ontologies. It utilises a proposal of expressive uncertain knowledge representation framework called ANUIC (Adaptive Net of Universally Interrelated Concepts)...